# CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning

## Overview

**CausalPlan** is a framework designed to enhance collaboration between agents powered by large language models (LLMs) through causal reasoning and structured planning. It builds on the foundation of the [ProAgent framework](https://github.com/PKU-Alignment/ProAgent) ([Zhang et al., 2024](https://arxiv.org/abs/2308.11339)), incorporating causal graphs and structured action planning to improve coordination and decision-making in multi-agent tasks.

---

## Installation & Setup

We recommend using a Conda environment with Python 3.7, as many dependencies (especially TensorFlow 1.x) are not supported in later versions.

### Step 1: Setup Environment

```bash
# Create and install via environment file
conda env create --file environment.yaml
```

### Step 2: Install Dependencies


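```bash
# Install the bundled Overcooked-AI package in editable mode
cd ./lib/overcooked_ai
pip install -e .

# Move to the sibling directory and install Stable Baselines
cd ../stable_baselines
pip install -e .
```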



### Step 3: Download the baselines

This code supports the following baselines:

- **Self-Play (SP)**
- **Population-Based Training (PBT)**
- [**Fictitious Co-Play (FCP)**](https://arxiv.org/abs/2110.08176)
- [**Maximum Entropy Population-Based Training (MEP)**](https://arxiv.org/abs/2112.11701)
- [**Cooperative Open-Ended Learning (COLE)**](https://arxiv.org/abs/2302.04831)
- **Human policies** are not included in this codebase, as they require a different virtual environment setup.


Pretrained models for these algorithms can be downloaded directly from this [Google Drive folder](https://drive.google.com/drive/folders/1s88a_muyG6pVlfcKDKop6R1Fhxr8dcGH). Please follow this layout mapping:

```python
PYTHON_LAYOUT_NAME_TO_ENV_NAME = {
    "unident_s": "Asymmetric Advantages",
    "simple": "Cramped Room",
    "random1": "Coordination Ring",
    "random0": "Forced Coordination",
    "random3": "Counter Circuit"
}
```
The pretrained models need to be saved in `./CausalPlan/models`.

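
When matching downloaded model folders to environments, it can be handy to look the mapping up in both directions. The sketch below reuses the dictionary from this README; the inverse table and the `layout_for_env` helper are hypothetical names introduced only for illustration and are not part of the codebase.

```python
# Layout mapping copied from this README.
PYTHON_LAYOUT_NAME_TO_ENV_NAME = {
    "unident_s": "Asymmetric Advantages",
    "simple": "Cramped Room",
    "random1": "Coordination Ring",
    "random0": "Forced Coordination",
    "random3": "Counter Circuit",
}

# Hypothetical inverse lookup: environment name -> Python layout name.
ENV_NAME_TO_PYTHON_LAYOUT_NAME = {
    env: layout for layout, env in PYTHON_LAYOUT_NAME_TO_ENV_NAME.items()
}


def layout_for_env(env_name):
    """Return the layout key for an environment name (illustrative helper)."""
    try:
        return ENV_NAME_TO_PYTHON_LAYOUT_NAME[env_name]
    except KeyError:
        known = ", ".join(sorted(ENV_NAME_TO_PYTHON_LAYOUT_NAME))
        raise ValueError(
            "Unknown environment {!r}; expected one of: {}".format(env_name, known)
        ) from None


print(layout_for_env("Cramped Room"))  # simple
```
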
### Step 4: How to Run

#### Add API Key

Create a file named `secrets.env` inside the `./proagent/` directory and add your Cohere API key:

```env
API_KEY=your_cohere_api_key_here
```

Alternatively, if you prefer not to use Cohere, you can change the `--gpt_explainer_model` argument to any open-source language model of your choice. Note, however, that the reported results and timing analysis are not guaranteed to hold under alternative models.
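
Before launching a run, it can help to confirm that `secrets.env` parses as expected. The following is a minimal standard-library sketch; `read_env_file` is a hypothetical helper, and how the codebase itself loads the key is not shown here. The demo writes a throwaway file rather than touching `./proagent/secrets.env`.

```python
import os
import tempfile


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo with a temporary file that mirrors the format shown above.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "secrets.env")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("# Cohere credentials\nAPI_KEY=your_cohere_api_key_here\n")
    demo_values = read_env_file(demo_path)

print("API_KEY" in demo_values)  # True
```

To check the real file, call `read_env_file("./proagent/secrets.env")` and verify that `API_KEY` is present and non-empty.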

#### Collect Data 

To collect data, run `main_causal.py` with the following command:

```bash
python main_causal.py \
  --p0 "MEP" \
  --p1 "MEP" \
  --collect_data p0 \
  --save_buffer True \
  --record_action_only True \
  --cg False \
  --fh False \
  --horizon 200000 \
  --train_SCM False \
  --bp "output_file_with_goal_and_op_and_rew_200k_cr_true_act_only_p0.pt" \
  -l "cramped_room"
```

- Use `--p0` and `--p1` to specify the agents used for data collection.
- Use `--collect_data` to indicate which agent the data should be collected for.
- Use `-l` to specify the layout.
- The generated buffer will be saved in the `./data` directory.
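
If you collect data for several layouts or agents, assembling the command programmatically avoids copy-paste mistakes. The sketch below uses only the flags shown in the command above; `build_collect_command` and the buffer file name are hypothetical and chosen for illustration.

```python
import shlex


def build_collect_command(layout, agent, collect_for, buffer_name, horizon=200000):
    """Assemble the data-collection command as an argument list."""
    return [
        "python", "main_causal.py",
        "--p0", agent,
        "--p1", agent,
        "--collect_data", collect_for,
        "--save_buffer", "True",
        "--record_action_only", "True",
        "--cg", "False",
        "--fh", "False",
        "--horizon", str(horizon),
        "--train_SCM", "False",
        "--bp", buffer_name,
        "-l", layout,
    ]


command = build_collect_command(
    layout="cramped_room",
    agent="MEP",
    collect_for="p0",
    buffer_name="buffer_cramped_room_p0.pt",  # hypothetical file name
)

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
```

To launch the run from Python, pass the list to `subprocess.run(command, check=True)` from the directory that contains `main_causal.py`.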



#### Train SCA

To train the causal graph, run:

```bash
python main_causal.py \
  --save_buffer False \
  --train_SCM True \
  --train_SCM_step 50000 \
  --bp "output_file_with_goal_and_op_and_rew_200k_cr_true_act_only_p0.pt" \
  --cgp "edge_params_with_action_after_200k_train_50k_test_w_op_final_cr_act_only_p0.pt" \
  -l "cramped_room"
```
- This will learn the causal structure from the data and save the resulting edge parameters (causal matrix) to the `./data` directory.

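
To make the idea of "edge parameters (causal matrix)" more concrete, here is a toy sketch. It is not the repository's code or file format. It assumes, purely for illustration, that entry `[i][j]` is a real-valued score for the edge from factor `j` to factor `i`, that a sigmoid turns the score into an edge probability, and that edges above a 0.5 threshold are kept. The factor names are borrowed from the naming list in this README; the numbers are made up.

```python
import math

# Toy factors: two states and one action (names taken from this README).
factors = ["hold_onion1", "pot0_0", "put_onion_in_pot"]

# ASSUMPTION: edge_params[i][j] scores the edge factors[j] -> factors[i].
# These values are invented for the example.
edge_params = [
    [-4.0, -4.0, -4.0],
    [-4.0, -4.0, -4.0],
    [3.0, 1.5, -4.0],
]


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def parents_of(target, threshold=0.5):
    """List the factors whose edge probability into `target` exceeds the threshold."""
    row = edge_params[factors.index(target)]
    return [name for name, score in zip(factors, row) if sigmoid(score) > threshold]


print(parents_of("put_onion_in_pot"))  # ['hold_onion1', 'pot0_0']
```

In this toy graph, holding an onion and an empty pot are both treated as causes of the `put_onion_in_pot` action.
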
#### Run LLM Agents

Run `./src/main_causal.py` to perform inference with the LLM agents. For example:

```bash
python main_causal.py \
  --p0 "COLE" \
  --p1 "ProAgent" \
  --cg True \
  --fh True \
  --seed 0 \
  --horizon 400 \
  --K 1 \
  --tunning True \
  --gamma 0.5 \
  --save_buffer False \
  --train_SCM False \
  --gpt_planner_model "meta-llama/Meta-Llama-3-8B-Instruct" \
  -l "cramped_room"
```
`--tunning` can be set to `True` or `False`. If set to `True`, a manually specified `gamma=0.5` is used. Depending on this setting, results are saved to either `./experiments` or `./experiments_tuning`.

#### Run Baselines
```bash
python main_causal.py \
  --p0 "FCP" \
  --p1 "FCP" \
  --seed 0 \
  --horizon 400 \
  -l "asymmetric_advantages"
```

## Notes

- For clarity in our manuscript, we have renamed the factorization states and actions to make them more descriptive and consistent with the Overcooked-AI environment. The mapping follows a structured naming convention where the suffix 1 refers to the controlling agent, and 2 refers to the other agent. Below, we summarize the key mappings used:
```
empty_hand → empty_hand1 / empty_hand2: The agent is not holding any object.

hold_onion → hold_onion1 / hold_onion2: The agent is holding an onion.

hold_dish → hold_dish1 / hold_dish2: The agent is holding an empty dish.

dish_with_soup → dish_with_soup1 / dish_with_soup2: The agent is holding a dish filled with soup.

pot_0 to pot_3 → pot0_0 to pot3_0: The pot contains 0 to 3 onions respectively.

pot_finished → pot_finished_0: The pot has finished cooking and soup is ready.

pot_1_0 to pot_1_3 → pot0_1 to pot3_1: The second pot contains 0 to 3 onions respectively.

pot_1_finished → pot_finished_1: The second pot has finished cooking.

goal_delivered → goal_delivered: A soup has been successfully delivered to the goal.

Actions are renamed as follows:

pickup(onion) → pickup_onion

put_onion_in_pot() → put_onion_in_pot

pickup(dish) → pickup_dish

fill_dish_with_soup() → fill_dish_with_soup

deliver_soup() → deliver_soup

place_onion_on_counter() → place_onion_on_counter

place_dish_on_counter() → place_dish_on_counter
```
- While we have made an effort to keep the code as readable as possible, its complexity stems from being built on top of an existing framework. A cleaner and more modular version of the code will be made available upon publication.
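
The state and action renaming listed in the first note can also be written as lookup tables, which is convenient when translating between names in the code and names in the manuscript. This is a sketch built only from that list; the table and function names (`rename_agent_state`, `rename_action`, and so on) are hypothetical and do not come from the codebase.

```python
# Per-agent states: the manuscript name is the code name plus an agent suffix
# (1 = controlling agent, 2 = other agent).
AGENT_STATE_NAMES = ["empty_hand", "hold_onion", "hold_dish", "dish_with_soup"]

# Pot states: first pot gets suffix _0, second pot gets suffix _1.
POT_RENAMES = {"pot_finished": "pot_finished_0", "pot_1_finished": "pot_finished_1"}
POT_RENAMES.update({"pot_{}".format(k): "pot{}_0".format(k) for k in range(4)})
POT_RENAMES.update({"pot_1_{}".format(k): "pot{}_1".format(k) for k in range(4)})

ACTION_RENAMES = {
    "pickup(onion)": "pickup_onion",
    "put_onion_in_pot()": "put_onion_in_pot",
    "pickup(dish)": "pickup_dish",
    "fill_dish_with_soup()": "fill_dish_with_soup",
    "deliver_soup()": "deliver_soup",
    "place_onion_on_counter()": "place_onion_on_counter",
    "place_dish_on_counter()": "place_dish_on_counter",
}


def rename_agent_state(name, agent):
    """Append the agent suffix (1 or 2) to a per-agent state name."""
    if name not in AGENT_STATE_NAMES:
        raise ValueError("Unknown agent state: {!r}".format(name))
    if agent not in (1, 2):
        raise ValueError("agent must be 1 (controlling) or 2 (other)")
    return "{}{}".format(name, agent)


def rename_action(name):
    """Translate a code-style action name to its manuscript name."""
    return ACTION_RENAMES[name]


print(rename_agent_state("hold_onion", 1))  # hold_onion1
print(POT_RENAMES["pot_1_3"])               # pot3_1
print(rename_action("pickup(dish)"))        # pickup_dish
```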


## Citation

```bibtex
@inproceedings{zhang2024proagent,
  title={ProAgent: building proactive cooperative agents with large language models},
  author={Zhang, Ceyao and Yang, Kaijie and Hu, Siyi and Wang, Zihao and Li, Guanghe and Sun, Yihang and Zhang, Cheng and Zhang, Zhaowei and Liu, Anji and Zhu, Song-Chun and others},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  pages={17591--17599},
  year={2024}
}

@inproceedings{li2023cooperative,
  title={Cooperative open-ended learning framework for zero-shot coordination},
  author={Li, Yang and Zhang, Shao and Sun, Jichen and Du, Yali and Wen, Ying and Wang, Xinbing and Pan, Wei},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  pages={20470--20484},
  year={2023}
}

@inproceedings{carroll2019utility,
  title={On the utility of learning about humans for human-{AI} coordination},
  author={Carroll, Micah and Shah, Rohin and Ho, Mark K and Griffiths, Thomas L and Seshia, Sanjit A and Abbeel, Pieter and Dragan, Anca},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  pages={5174--5185},
  year={2019}
}

@inproceedings{strouse2021collaborating,
  title={Collaborating with humans without human data},
  author={Strouse, DJ and McKee, Kevin R and Botvinick, Matt and Hughes, Edward and Everett, Richard},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  pages={14502--14515},
  year={2021}
}

@inproceedings{zhao2023maximum,
  title={Maximum entropy population-based training for zero-shot human-AI coordination},
  author={Zhao, Rui and Song, Jinming and Yuan, Yufeng and Hu, Haifeng and Gao, Yang and Wu, Yi and Sun, Zhongqian and Yang, Wei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  pages={6145--6153},
  year={2023}
}
```